Standards in Genomic Sciences (2013) 9:264-272 



DOI:10.4056/sigs.4508258



Genome sequence of the South American clover-nodulating
Rhizobium leguminosarum bv. trifolii strain WSM597



Wayne Reeve* 1 , Jason Terpolilli 1 , Vanessa Melino 1 , Julie Ardley 1 , Rui Tian 1 , Sofie De Meyer 1 , 
Ravi Tiwari 1 , Ronald Yates 1,2 , Graham O'Hara 1 , John Howieson 1 , Mohamed Ninawi 1 , 
Brittany Held 3 , David Bruce 3 , Chris Detter 3 , Roxanne Tapia 3 , Cliff Han 3 , Chia-Lin Wei 3 , 
Marcel Huntemann 3 , James Han 3 , I-Min Chen 5 , Konstantinos Mavromatis 3 , Victor 
Markowitz 3 , Natalia Ivanova 3 , Galina Ovchinnikova 3 , Ioanna Pagani 3 , Amrita Pati 3 , Lynne 
Goodwin 4 , Tanja Woyke 3 & Nikos Kyrpides 3 . 

1 Centre for Rhizobium Studies, Murdoch University, Western Australia, Australia 

2 Department of Agriculture and Food, Western Australia, Australia 

3 DOE Joint Genome Institute, Walnut Creek, California, USA 

4 Los Alamos National Laboratory, Bioscience Division, Los Alamos, New Mexico, USA 

5 Biological Data Management and Technology Center, Lawrence Berkeley National Laboratory, Berkeley, California, USA

Correspondence: Wayne Reeve (W.Reeve@murdoch.edu.au) 

Keywords: root-nodule bacteria, nitrogen fixation, rhizobia, Alphaproteobacteria 



Rhizobium leguminosarum bv. trifolii strain WSM597 is an aerobic, motile, Gram-negative, non- 
spore-forming rod isolated from a root nodule of the annual clover Trifolium pallidum L. growing 
at Glencoe Research Station near Tacuarembo, Uruguay. This strain is generally ineffective for ni- 
trogen (N) fixation with clovers of Mediterranean, North American and African origin, but is effec- 
tive on the South American perennial clover T. polymorphum Poir. Here we describe the features 
of R. leguminosarum bv. trifolii strain WSM597, together with genome sequence information and 
annotation. The 7,634,384 bp high-quality-draft genome is arranged in 2 scaffolds of 53 contigs, 
contains 7,394 protein-coding genes and 87 RNA-only encoding genes, and is one of 20 rhizobial 
genomes sequenced as part of the DOE Joint Genome Institute 2010 Community Sequencing Pro- 
gram. 



Introduction 



A key factor which limits the productivity of agri- 
cultural systems is the availability of soil nitrogen 
(N). Legumes can overcome soil N limitations by 
forming symbiotic relationships with root nodule 
bacteria (rhizobia). Rhizobia, through their interaction
with legumes, are able to reduce atmospheric dinitrogen
(N2) into ammonia, which can supply essential N for growth
to the plant. In addition, much of this fixed N is
subsequently released
into the soil following plant senescence and decay, 
grazing by livestock or human harvest [1], thereby 
increasing soil N content and fertility for subse- 
quent crops. Thus, biological N2 fixation forms a 
vital component of sustainable agriculture as it 
provides a means of ameliorating N-deficient soils 
without the need for industrially synthesized N-based
fertilizers, the production and application of which
have significant environmental and economic costs [2].

Forage and fodder legumes play an integral role in
sustainable farming practice, providing feed for
stock while also enriching soil with available N.
Worldwide, there are approximately 110 million
ha of forage and fodder legumes under production
[3], of which Trifolium spp. (clover) are of key
importance [4]. The bacterial microsymbionts that
nodulate clovers are Rhizobium leguminosarum
bv. trifolii. Since Trifolium spp. are geographically
widely distributed and are also phenologically
variable (i.e. they may be either annual [e.g. T.
subterraneum, T. pallidum and T. scutatum] or
perennial [e.g. T. pratense, T. repens and T.
polymorphum]), it is rare that a single strain of R.
leguminosarum bv. trifolii can effectively fix N2
across a wide diversity of clovers [5].


Rhizobium leguminosarum bv. trifolii strain WSM597
was isolated from nodules of Trifolium pallidum
collected at the INIA Glencoe Research Station,
Uruguay, in 1999. WSM597 is able to nodulate (Nod+)
and fix N2 effectively (Fix+) on the South American
perennial clover Trifolium polymorphum. However, while
WSM597 is able to nodulate Trifolium pallidum and other
annual and perennial Trifolium spp. of Mediterranean,
African and North American origin, it is not effective
for N2 fixation on any of these hosts (Yates et al.,
unpublished data). WSM597 is therefore highly specific
for effectiveness in symbiosis, as is also evident
with the recently sequenced South American clover
microsymbiont R. leguminosarum bv. trifolii
WSM2304 [6]. Thus, both microsymbionts demonstrate
that phenological and geographic barriers exist for
effective nodulation in clover symbioses. As this
phenotype represents a common challenge to managing
the legume-rhizobial symbiosis in agriculture, the
genome of WSM597 is a valuable comparator for genetic
studies of nodulation and N2 fixation. Here we present
a summary classification and a set of general features
for R. leguminosarum bv. trifolii strain WSM597
together with a description of the genome sequence
and annotation.

Classification and general features 

R. leguminosarum bv. trifolii strain WSM597 is a
motile, Gram-negative rod (Figure 1 Left and Center)
in the order Rhizobiales of the class
Alphaproteobacteria. It is fast growing in laboratory
culture, forming colonies within 3-4 days when
grown on half Lupin Agar (½LA) [7] at 28°C. Colonies
on ½LA are white-opaque, slightly domed and
moderately mucoid with smooth margins (Figure
1 Right). Minimum Information about the Genome
Sequence (MIGS) is provided in Table 1. Figure 2
shows the phylogenetic neighborhood of R.
leguminosarum bv. trifolii strain WSM597 in a 16S
rRNA sequence based tree. This strain clusters
closest to Rhizobium leguminosarum bv. trifolii
T24 and Rhizobium leguminosarum bv. phaseoli
RRE6, with 99.9% and 99.8% sequence identity,
respectively.

Symbiotaxonomy 

R. leguminosarum bv. trifolii WSM597 nodulates
(Nod+) and fixes N2 effectively (Fix+) with the
South American perennial clover T. polymorphum.
However, WSM597 is ineffective on perennial clovers
of North American (T. reflexum and T. amabile) and
African origin (T. semipilosum). WSM597 is also
ineffective on a range of Mediterranean annuals
(T. resupinatum, T. clusii, T. michelianum,
T. isthmocarpum, T. scutatum, T. incarnatum,
T. tomentosum), including its host of origin
T. pallidum, and on the North American annual
T. bejariense (Yates, R., pers. comm.).

Genome sequencing and annotation information

Genome project history 

This organism was selected for sequencing on the 
basis of its environmental and agricultural rele- 
vance to issues in global carbon cycling, alterna- 
tive energy production, and biogeochemical im- 
portance, and is part of the Community Sequenc- 
ing Program at the U.S. Department of Energy, 
Joint Genome Institute (JGI) for projects of rele- 
vance to agency missions. The genome project is 
deposited in the Genomes OnLine Database [25] 
and an improved-high-quality-draft genome se- 
quence in IMG. Sequencing, finishing and annota- 
tion were performed by the JGI. A summary of the 
project information is shown in Table 2. 




Figure 1. Images of Rhizobium leguminosarum bv. trifolii strain WSM597 using scanning (Left) 
and transmission (Center) electron microscopy as well as light microscopy to visualize colony 
morphology on a solid medium (Right). 
Table 1. Classification and general features of Rhizobium leguminosarum bv. trifolii strain WSM597 according
to the MIGS recommendations [8].

MIGS ID    Property                 Term                                          Evidence code

           Current classification   Domain Bacteria                               TAS [9]
                                    Phylum Proteobacteria                         TAS [10]
                                    Class Alphaproteobacteria                     TAS [11,12]
                                    Order Rhizobiales                             TAS [12,13]
                                    Family Rhizobiaceae                           TAS [14,15]
                                    Genus Rhizobium                               TAS [14,16-19]
                                    Species Rhizobium leguminosarum bv. trifolii  IDA [14,16,19,20]
           Gram stain               Negative                                      IDA
           Cell shape               Rod                                           IDA
           Motility                 Motile                                        IDA
           Sporulation              Non-sporulating                               NAS
           Temperature range        Mesophile                                     NAS
           Optimum temperature      28°C                                          NAS
MIGS-22    Oxygen requirement       Aerobic                                       NAS
           Carbon source            Varied                                        IDA
           Energy source            Chemoorganotroph                              NAS
MIGS-6     Habitat                  Soil, root nodule on host                     IDA
MIGS-15    Biotic relationship      Free living, symbiotic                        IDA
MIGS-14    Pathogenicity            Non-pathogenic                                NAS
           Biosafety level          1                                             TAS [21]
           Isolation                Legume root nodule                            IDA
MIGS-4     Geographic location      Tacuarembo, Uruguay                           IDA
MIGS-5     Nodule collection date   1999                                          IDA
MIGS-4.1   Longitude                -56                                           IDA
MIGS-4.2   Latitude                 -31.41
MIGS-4.3   Depth                    5 cm soil depth
MIGS-4.4   Altitude                 130 m

Evidence codes - IDA: Inferred from Direct Assay (i.e. first time published); TAS: Traceable Author Statement 
(i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for 
the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). 
These evidence codes are from the Gene Ontology project [22]. 
[Figure 2 image not reproduced: 16S rRNA gene phylogenetic tree; scale bar 0.01]

Figure 2. Phylogenetic tree showing the relationships of Rhizobium leguminosarum bv. trifolii strain WSM597 
(shown in blue print) with some of the root nodule bacteria in the order Rhizobiales based on aligned se- 
quences of the 16S rRNA gene (1,307 bp internal region). All sites were informative and there were no gap- 
containing sites. Phylogenetic analyses were performed using MEGA, version 5.05 [23]. The tree was built 
using the maximum likelihood method with the General Time Reversible model. Bootstrap analysis [24] with 
500 replicates was performed to assess the support of the clusters. Type strains are indicated with a super- 
script T. Strains with a genome sequencing project registered in GOLD [25] are in bold print and the GOLD 
ID is mentioned after the accession number. Published genomes are designated with an asterisk. 
Table 2. Genome sequencing project information for Rhizobium leguminosarum bv. trifolii strain
WSM597.

MIGS ID    Property                Term

MIGS-31    Finishing quality       Improved high-quality draft
MIGS-28    Libraries used          Illumina GAii shotgun and paired end 454 libraries
MIGS-29    Sequencing platforms    Illumina GAii and 454 GS FLX Titanium technologies
MIGS-31.2  Sequencing coverage     7.8× 454 paired end, 764.2× Illumina
MIGS-30    Assemblers              Velvet 1.0.13, Newbler 2.3, phrap 4.24
MIGS-32    Gene calling methods    Prodigal 1.4, GenePRIMP
           GOLD ID                 Gi06486
           NCBI project ID         65299
           Database: IMG           2509276021
           Project relevance       Symbiotic N2 fixation, agriculture


Growth conditions and DNA isolation 

Rhizobium leguminosarum bv. trifolii strain 
WSM597 was grown to mid logarithmic phase in 
TY rich medium [26] on a gyratory shaker at 28°C. 
DNA was isolated from 60 mL of cells using a 
CTAB (Cetyl trimethyl ammonium bromide) bac- 
terial genomic DNA isolation method [27]. 

Genome sequencing and assembly 

The genome of Rhizobium leguminosarum bv.
trifolii strain WSM597 was sequenced at the Joint
Genome Institute (JGI) using a combination of
Illumina [28] and 454 technologies [29]. An
Illumina GAii shotgun library, which generated
73,610,574 reads totaling 5,594.4 Mb, and a
paired end 454 library with an average insert size
of 14 Kb, which generated 335,966 reads totaling
93.4 Mb, were produced for this genome. All general
aspects of library construction and sequencing
performed at the JGI can be found at the JGI
website [30]. The initial draft assembly contained
190 contigs in 6 scaffolds. The 454 Titanium
standard data and the 454 paired end data were
assembled together with Newbler, version
2.3-PreRelease-6/30/2009. The Newbler consensus
sequences were computationally shredded into 2 Kb
overlapping fake reads (shreds). Illumina
sequencing data were assembled with VELVET,
version 1.0.13 [31], and the consensus sequences
were computationally shredded into 1.5 Kb
overlapping fake reads (shreds). The 454
Newbler consensus shreds, the Illumina VELVET
consensus shreds and the read pairs in the 454
paired end library were integrated using parallel
phrap, version SPS - 4.24 (High Performance
Software, LLC). The software Consed (Ewing and
Green 1998; Ewing et al. 1998; Gordon et al. 1998)
was used in the following finishing process.
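
The shredding of consensus sequences into overlapping fake reads can be illustrated with a minimal Python sketch. The function name `shred`, its parameters and the overlap value are hypothetical choices made for illustration: the text states only the shred lengths (2 Kb and 1.5 Kb), not the overlap, and this is not the JGI implementation.

```python
def shred(consensus: str, shred_len: int, overlap: int) -> list[str]:
    """Cut a consensus sequence into overlapping pseudo-reads ("shreds").

    Consecutive shreds start (shred_len - overlap) bases apart, so each
    shred shares `overlap` bases with the previous one. The last shred
    may be shorter than shred_len.
    """
    if not 0 <= overlap < shred_len:
        raise ValueError("overlap must be >= 0 and smaller than shred_len")
    step = shred_len - overlap
    shreds = []
    for start in range(0, len(consensus), step):
        shreds.append(consensus[start:start + shred_len])
        # Stop once a shred has reached the end of the consensus.
        if start + shred_len >= len(consensus):
            break
    return shreds


# Toy example: a 5 kb placeholder sequence cut into 2 kb shreds.
# The 1 kb overlap is an assumed value, not taken from the text.
example = shred("A" * 5000, shred_len=2000, overlap=1000)
print(len(example), [len(s) for s in example])
```

With these assumed settings every base of the toy sequence is covered by at least one shred, which is the property that lets the shreds stand in for the original consensus during the later integration step.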



Illumina data were used to correct potential base
errors and increase consensus quality using the
software Polisher developed at JGI (Alla Lapidus,
unpublished). Possible mis-assemblies were
corrected using gapResolution (Cliff Han,
unpublished), Dupfinisher (Han, 2006), or
sequencing cloned bridging PCR fragments with
subcloning. Gaps between contigs were closed by
editing in Consed, by PCR and by Bubble PCR (J-F
Cheng, unpublished) primer walks. A total of 215
additional reactions were necessary to close gaps
and to raise the quality of the finished sequence.
The estimated genome size is 7.3 Mb and the final
assembly is based on 57.2 Mb of 454 draft data,
which provides an average 7.8× coverage of the
genome, and 5,578.3 Mb of Illumina draft data,
which provides an average 764.2× coverage of the
genome.
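
The coverage figures quoted above follow from dividing the amount of sequence data by the estimated genome size. The short Python sketch below reproduces that arithmetic; the function and variable names are illustrative only.

```python
def mean_coverage(data_mb: float, genome_mb: float) -> float:
    """Average fold coverage = total sequenced bases / genome size."""
    return data_mb / genome_mb


GENOME_MB = 7.3  # estimated genome size stated in the text

cov_454 = mean_coverage(57.2, GENOME_MB)         # 454 draft data
cov_illumina = mean_coverage(5578.3, GENOME_MB)  # Illumina draft data

print(f"454: {cov_454:.1f}x, Illumina: {cov_illumina:.1f}x")
# prints "454: 7.8x, Illumina: 764.2x"
```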

Genome annotation 

Genes were identified using Prodigal [32] as part 
of the DOE-JGI Annotation pipeline [33], followed 
by a round of manual curation using the JGI 
GenePRIMP pipeline [34]. The predicted CDSs 
were translated and used to search the National 
Center for Biotechnology Information (NCBI) non- 
redundant database, UniProt, TIGRFam, Pfam, 
PRIAM, KEGG, COG, and InterPro databases. These 
data sources were combined to assert a product 
description for each predicted protein. Non- 
coding genes and miscellaneous features were 
predicted using tRNAscan-SE [35], RNAmmer [36], 
Rfam [37], TMHMM [38], and SignalP [39]. Addi- 
tional gene prediction analyses and functional an- 
notation were performed within the Integrated 
Microbial Genomes (IMG-ER) platform [40]. 
Genome properties 

The genome is 7,634,384 nucleotides with 61.01%
GC content (Table 3) in 2 scaffolds containing 53
contigs. From a total of 7,481 genes, 7,394 were
protein encoding and 87 were RNA-only encoding
genes. The majority of genes (79.24%) were
assigned a putative function whilst the remaining
genes were annotated as hypothetical. The
distribution of genes into COGs functional categories
is presented in Table 4 and Figure 3.
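
The percentages reported in this section and in Table 3 can be rechecked from the raw counts. The following Python sketch does so; the helper `pct` and the dictionary keys are illustrative names, and the counts are those given in Table 3.

```python
GENOME_BP = 7_634_384   # genome size (bp)
TOTAL_GENES = 7_481     # total genes


def pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, to two decimals."""
    return round(100 * part / whole, 2)


stats = {
    "coding_region": pct(6_596_806, GENOME_BP),   # DNA coding region (bp)
    "gc_content": pct(4_657_890, GENOME_BP),      # G+C bases
    "protein_coding": pct(7_394, TOTAL_GENES),
    "rna_genes": pct(87, TOTAL_GENES),
    "function_predicted": pct(5_928, TOTAL_GENES),
}

# Protein-coding and RNA genes together account for every gene.
assert 7_394 + 87 == TOTAL_GENES

for name, value in stats.items():
    print(f"{name}: {value:.2f}%")
```

Bases are used as the denominator for the coding-region and G+C rows, and total genes for the gene-level rows, which matches the values listed in the "% of Total" column.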



Table 3. Genome Statistics for Rhizobium leguminosarum bv. trifolii strain WSM597.

Attribute                          Value        % of Total

Genome size (bp)                   7,634,384    100.00
DNA coding region (bp)             6,596,806    86.41
DNA G+C content (bp)               4,657,890    61.01
Number of scaffolds                2
Number of contigs                  53
Total genes                        7,481        100.00
RNA genes                          87           1.16
rRNA operons*                      1
Protein-coding genes               7,394        98.84
Genes with function prediction     5,928        79.24
Genes assigned to COGs             5,886        78.68
Genes assigned Pfam domains        6,150        82.21
Genes with signal peptides         634          8.47
Genes with transmembrane helices   1,655        22.12
CRISPR repeats                     0

*1 extra 5S rRNA and 2 extra 16S rRNA genes

[Figure 3 image not reproduced; panel labels: RLH.1, RLH.2]


Figure 3. Graphical map of the two DNA scaffolds of Rhizobium leguminosarum bv.
trifolii strain WSM597. From outside to the center: genes on forward strand (color by
COG categories as denoted by the IMG platform), genes on reverse strand (color by
COG categories), RNA genes (tRNAs green, sRNAs red, other RNAs black), GC
content, GC skew.
Table 4. Number of protein coding genes of Rhizobium leguminosarum bv. trifolii strain
WSM597 associated with the general COG functional categories.

Code  Value  %age   Description

J     195    2.95   Translation, ribosomal structure and biogenesis
A     0      0.00   RNA processing and modification
K     627    9.50   Transcription
L     233    3.53   Replication, recombination and repair
B     2      0.03   Chromatin structure and dynamics
D     44     0.67   Cell cycle control, mitosis and meiosis
Y     0      0.00   Nuclear structure
V     73     1.11   Defense mechanisms
T     375    5.68   Signal transduction mechanisms
M     333    5.05   Cell wall/membrane biogenesis
N     108    1.64   Cell motility
Z     1      0.02   Cytoskeleton
W     0      0.00   Extracellular structures
U     107    1.62   Intracellular trafficking and secretion
O     200    3.03   Posttranslational modification, protein turnover, chaperones
C     351    5.32   Energy production and conversion
G     674    10.21  Carbohydrate transport and metabolism
E     748    11.33  Amino acid transport and metabolism
F     109    1.65   Nucleotide transport and metabolism
H     211    3.20   Coenzyme transport and metabolism
I     242    3.67   Lipid transport and metabolism
P     297    4.50   Inorganic ion transport and metabolism
Q     171    2.59   Secondary metabolite biosynthesis, transport and catabolism
R     850    12.88  General function prediction only
S     649    9.83   Function unknown
-     1,595  21.32  Not in COGs
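
The percentages in Table 4 appear to use two different denominators, which the Python sketch below makes explicit. This is an observation derived from the table values, not a statement from the authors: the per-category percentages match the sum of all category assignments, whereas the "Not in COGs" row matches the total gene count from Table 3. The variable names are illustrative.

```python
# COG category counts as listed in Table 4.
cog_counts = {
    "J": 195, "A": 0, "K": 627, "L": 233, "B": 2, "D": 44, "Y": 0,
    "V": 73, "T": 375, "M": 333, "N": 108, "Z": 1, "W": 0, "U": 107,
    "O": 200, "C": 351, "G": 674, "E": 748, "F": 109, "H": 211,
    "I": 242, "P": 297, "Q": 171, "R": 850, "S": 649,
}
TOTAL_GENES = 7_481   # Table 3
NOT_IN_COGS = 1_595   # last row of Table 4

# Denominator for per-category percentages: all category assignments.
total_assignments = sum(cog_counts.values())
cog_pct = {code: round(100 * n / total_assignments, 2)
           for code, n in cog_counts.items()}

# Denominator for the "Not in COGs" row: all genes.
not_in_cogs_pct = round(100 * NOT_IN_COGS / TOTAL_GENES, 2)

print(total_assignments)            # 6600
print(cog_pct["E"], cog_pct["R"])   # 11.33 12.88
print(not_in_cogs_pct)              # 21.32
```

The assignment total (6,600) exceeds the 5,886 genes assigned to COGs in Table 3, presumably because a single gene can be placed in more than one functional category.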